Regex Golf
__NOINDEX__Regex Golf is a programming game where the goal is to write a regex which matches all the items in one list without matching any item in a second list, with the catch that the regex has to be as short as possible (otherwise it would only be necessary to exactly match each item in the first list, like /^(item|item|item|...)$/). The game is a descendant of code golf, a much older programming game where the goal is to write the shortest possible program that performs a given task. Background Regex golf appears to have been first introduced by the website regex.alf.nu sometime in 2013, but went unnamed until January 6, 2014, when Randall Munroe published xkcd #1313: "Regex Golf". With carefully chosen lists, regex golf might also serve as an example of nerd sniping, another concept introduced by Munroe in a much earlier xkcd strip, on December 12, 2007. xkcd In the original xkcd strip, two examples of regex golf are given: the first, M | TN|B, is said to match the subtitles of all Star Wars films without matching any of the subtitles of Star Trek films, and the second, bu|rnt|coye|mtga|j|iso|nhl|aed|lev|sh|lndi|poo|ls, is said to match the last names of elected United States presidents but not their opponents; both regexes are analyzed by Explainxkcd.com and an analysis won't be repeated here. regex.alf.nu The website regex.alf.nu takes the form of a series of programming challenges in which the reader is presented with two lists of words or phrases and an input box to type the regex in. Because this is an in-browser game, the ECMA flavor of regexes is used for solving levels. The left list is the one to be matched, and the right list is the one to be excluded; a score is given to the right of the input box. Each level has a maximum possible score assuming perfect matches and a regex with no characters. The score starts at 0, and each correct match adds ten points, each incorrect match subtracts ten points, and each character in the regex subtracts one point. Each level also has a hard mode, in which both lists are extended with additional items that are automatically generated (thus preventing brute-force solutions) to incorrectly match the player's regex if it is incorrect. Incorrect solutions cause 100 points to be subtracted from the player's score. Each of the following sections covers one level. A section starts with the explanation(s) given on the level, followed by a brief further explanation that gives the maximum and minimum possible score for the level (both of which assume a regex with zero characters; while a regex can't increase the score past the maximum, it can decrease it past the minimum). After this is a table of the word lists, and then a table of solutions with explanations and scores. The word lists and solutions are hidden in collapsed tables to allow the page to be read without spoiling any levels. Generally, only correct solutions of a minimum length are given, unless a longer or incorrect solution is interesting enough to merit inclusion. Warmup Warmup — Type a regex in the box. You get ten points per match (or lose ten, if you match something you shouldn't); each character costs one point. As the first level, this one has the simplest solution. Each list contains 21 items, for a maximum possible score of 210 points (and a minimum possible score of -210). Anchors Anchors — You are deducted one point per character you use, and ten if you match something you shouldn't. This is the first level to require regex features beyond simple text matching, and the first to have different minimal solutions for normal and hard mode. There are 21 items in each list, again giving a maximum score of 210 and a minimum score of -210. Ranges Ranges — The test vectors were generated by grepping /usr/dict/words. Can you tell? This level requires more advanced features of regexes to solve. Each list again has 21 items, resulting in maximum and minimum possible scores of 210 and -210, respectively. Backrefs Backrefs — This doesn't really work as a tutorial. Not really clear what you're supposed to do here. This is the first level that requires a solution which isn't a valid regular expression per the computer science definition. Each list has 21 items, for a maximum and minimum possible score of 210 and -210, respectively. Abba Abba — Let's pretend this one is not a rehash of the last one. This level requires a solution similar to the solution for "Backrefs", true to its description. It's the first level to have different numbers of items in the lists, with 21 items in the left list and 22 items in the right, for a maximum possible score of 210 points, but a minimum possible of -220. A man, a plan A man, a plan — You're allowed to cheat a little. Even in hard mode, words will be no longer than 13 characters. This is similar to the last level, but a somewhat simpler solution can be used. The left list has 19 items, making the maximum possible score 190, while the right list has 21 items, for a minimum possible score of -210. Prime Prime — The length is not part of the string. I should probably have chosen a different color. The items to be matched are all sequences of a prime number of "x"s; hard mode only adds a single-x item to the list of items not to be matched. There are 20 items in each list, but each item is worth 15 points instead of the normal ten, for a maximum possible score of 300 and a minimum possible score of -300. This is the first level to truncate words shown to the player due to length; levels are only truncated in this way when the truncated characters are obvious from the non-truncated portion (as with these items, which are obviously just sequences of "x"s). Four Four — You can get an extra point by ignoring the name of this level. This level again involves repeating matches, but is much more straightforward than previous levels. Each list has 21 items, for maximum and minimum possible scores of 210 and -210, respectively. Order Order — Cheat. This level is one that must be solved by ignoring the level's name and the obvious property of the items to be matched in order to get a good score, though hard mode prevents this strategy thanks to its open-ended word generation. Each list has 21 items, for a maximum and minimum possible score of 210 and -210, respectively. Triples Triples — Multiples of 7 are left as an exercise for the reader. Each list has 21 items, but each item in this level is worth 30 points, resulting in a maximum possible score of 630 and a minimum possible score of -630. Glob Glob — No test case will have more than three asterisks (and max total length of, let's say 100 why not) Each list has 21 items, though each item is worth 15 points in this level, for a maximum possible score of 320 and a minimum possible score of -320. Balance Balance — This one is also impossible, but there's a finite number of test cases (except in hard mode, where no input is more than seven levels deep) The left list has 33 items, and the right list has 32, with each item worth 15 points, for a maximum possible score of 495 points and a minimum possible score of -480 points. Powers Powers — In hard mode, you have to recognize any power of two. Each list has 11 items, for maximum and minimum possible scores of 110 and -110 points, respectively. Long count Long count — Interestingly, the left list has only a single item, while the right list has 20 items, and each item is worth 270 points, for maximum and minimum possible scores of 270 and -5400 points, respectively. Alphabetical Alphabetical — To save on typing the input will only use the characters a/e/n/r/s/t, even in hard mode Powers 2 Powers 2 — Or 3. This level is much the same as "Powers", except the number of "x"s to be matched is now a power of 3 instead of 2. Unusually, the left list has more items than the right: the left list has 11 items, for a maximum possible score of 110, while the right list has only 10 items, for a minimum possible score of -100. Interestingly, in normal mode, there is a nonperfect solution that scores higher than the best perfect solution.